A Hybrid Unsupervised Density-based Approach with Mutual Information for Text Outlier Detection

Abstract

The detection of outliers in text documents is a highly challenging task, primarily due to the unstructured nature of documents and the curse of dimensionality. Text document outliers refer to text data that deviates from the text found in other documents belonging to the same category. Mining text document outliers has wide applications in various domains, including spam email identification, digital libraries, medical archives, enhancing the performance of web search engines, and cleaning corpora used in document classification. To address the issue of dimensionality, it is crucial to employ feature selection techniques that reduce the large number of features without compromising their representativeness of the domain. In this paper, we propose a hybrid density-based approach that incorporates mutual information for text document outlier detection. The proposed approach utilizes normalized mutual information to identify the most distinct features that characterize the target domain. Subsequently, we customize the well-known density-based local outlier factor algorithm to suit text document datasets. To evaluate the effectiveness of the proposed approach, we conduct experiments on synthetic and real datasets comprising twelve high-dimensional datasets. The results demonstrate that the proposed approach consistently outperforms conventional methods, achieving an average improvement of 5.73% in terms of the AUC metric. These findings highlight the remarkable enhancements achieved by leveraging normalized mutual information in conjunction with a density-based algorithm, particularly in high-dimensional datasets.
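The two stages named in the abstract (feature selection by normalized mutual information, followed by local outlier factor scoring) can be illustrated with a minimal sketch. This is not the authors' implementation. Several choices here are assumptions made only for illustration: the function names `nmi`, `select_features`, `cosine_distance`, and `lof_scores` are hypothetical; NMI is computed between binary term presence and a per-document domain indicator; cosine distance stands in for whatever customization the paper applies to LOF; and each neighborhood holds exactly k documents, without the tie handling of the full LOF definition.

```python
import math
from collections import Counter

def nmi(x, y):
    """Normalized mutual information I(X;Y) / sqrt(H(X) * H(Y)) of two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    hx = -sum(c / n * math.log(c / n) for c in px.values())
    hy = -sum(c / n * math.log(c / n) for c in py.values())
    if hx == 0 or hy == 0:
        return 0.0  # a constant variable carries no information
    mi = sum(c / n * math.log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())
    return mi / math.sqrt(hx * hy)

def select_features(X, domain, k):
    """Indices of the k terms whose presence shares the most NMI with `domain`.

    Assumption: `domain` is a 0/1 indicator per document (target domain or not).
    """
    scores = [nmi([int(row[j] > 0) for row in X], domain) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]

def cosine_distance(a, b):
    """1 - cosine similarity; assumed here as a text-friendly distance."""
    dot = sum(u * v for u, v in zip(a, b))
    na = math.sqrt(sum(u * u for u in a))
    nb = math.sqrt(sum(v * v for v in b))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - dot / (na * nb)

def lof_scores(X, k, dist=cosine_distance):
    """Simplified local outlier factor: larger scores mean more outlying."""
    n = len(X)
    d = [[dist(X[i], X[j]) for j in range(n)] for i in range(n)]
    # k nearest neighbours of each point, excluding the point itself
    knn = [sorted((j for j in range(n) if j != i), key=lambda j: d[i][j])[:k]
           for i in range(n)]
    kdist = [d[i][knn[i][-1]] for i in range(n)]
    # local reachability density; the small epsilon avoids division by zero
    lrd = []
    for i in range(n):
        reach = sum(max(kdist[j], d[i][j]) for j in knn[i]) / k
        lrd.append(1.0 / (reach + 1e-12))
    # LOF: average neighbour density relative to the point's own density
    return [sum(lrd[j] for j in knn[i]) / (k * lrd[i]) for i in range(n)]
```

In such a pipeline, one would first keep only the columns returned by `select_features` and then pass the reduced document vectors to `lof_scores`; documents whose score is well above 1 are candidates for outliers.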

Similar resources

A Local Density-Based Approach for Local Outlier Detection

This paper presents a simple but effective density-based outlier detection approach using local kernel density estimation (KDE). A Relative Density-based Outlier Score (RDOS) is introduced to measure the local outlierness of objects, in which the density distribution at the location of an object is estimated with a local KDE method based on extended nearest neighbors of the object. Instead of...

Outlier Detection in Dataset using Hybrid Approach

An outlier is a data point that deviates markedly from the rest of the dataset. Most real-world datasets contain outliers. Outlier analysis is a data mining technique whose task is to discover data that behave exceptionally compared with the remaining dataset. Outlier detection plays an important role in the data mining field. Outlier detection is useful in many fields like Medical, Netwo...

RODHA: Robust Outlier Detection using Hybrid Approach

The task of outlier detection is to find the small groups of data objects that are exceptional to the inherent behavior of the rest of the data. Detection of such outliers is fundamental to a variety of database and analytic tasks such as fraud detection and customer migration. There are several approaches [10] to outlier detection employed in many study areas, amongst which distance based and de...

Outlier Detection for Text Data

The problem of outlier detection is extremely challenging in many domains such as text, in which the attribute values are typically non-negative, and most values are zero. In such cases, it often becomes difficult to separate the outliers from the natural variations in the patterns in the underlying data. In this paper, we present a matrix factorization method, which is naturally able to distin...

Intrusion Detection based on a Novel Hybrid Learning Approach

Information security and Intrusion Detection Systems (IDS) play a critical role in the Internet. An IDS is an essential tool for detecting different kinds of attacks in a network and maintaining data integrity, confidentiality and system availability against possible threats. In this paper, a hybrid approach towards achieving high performance is proposed. In fact, the important goal of this paper ...

Journal

Journal title: International Journal of Intelligent Systems and Applications

Year: 2023

ISSN: 2074-904X, 2074-9058

DOI: https://doi.org/10.5815/ijisa.2023.05.04